Introduces API to get inference config, removes unused inference config defaults #890
This change is part of the following stack: Change managed by git-spice.
Code Review
This pull request introduces a get_inference_config method to both the classifier and regressor, allowing users to access the active configuration before calling fit. It also cleans up the codebase by removing unused V2.6 preprocessor configurations and presets. The review feedback recommends returning a deep copy of the configuration object to prevent accidental mutation of the estimator's internal state and suggests standardizing docstring formatting for better consistency across the documentation.
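The deep-copy recommendation can be illustrated with a minimal sketch. It does not reproduce TabPFN's implementation: `InferenceConfigSketch`, `EstimatorSketch`, and `example_list_field` are hypothetical stand-ins, and only the names `get_inference_config` and `MAX_NUMBER_OF_CLASSES` are taken from this thread.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class InferenceConfigSketch:
    """Hypothetical stand-in for the real inference config object."""

    MAX_NUMBER_OF_CLASSES: int = 10
    # Hypothetical mutable field, included to show why a shallow copy is not enough.
    example_list_field: list = field(default_factory=lambda: ["a", "b"])


class EstimatorSketch:
    """Hypothetical estimator; not TabPFN's actual classifier or regressor."""

    def __init__(self):
        self._config = InferenceConfigSketch()

    def get_inference_config(self):
        # Hand out a deep copy so callers cannot mutate the estimator's state,
        # including nested mutable fields.
        return copy.deepcopy(self._config)


est = EstimatorSketch()
cfg = est.get_inference_config()

# Mutating the returned object leaves the estimator untouched.
cfg.MAX_NUMBER_OF_CLASSES = 999
cfg.example_list_field.append("c")
```

With a plain reference (or a shallow copy, for the list field) the two mutations at the end would leak into the estimator's internal config.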
Sorry,
I think I previously overlooked something. Can't we now remove _get_tabpfn_v2_6_config: https://github.com/PriorLabs/TabPFN/blob/231de0c/src/tabpfn/inference_config.py#L332-L364?
Yep, should be removed with this PR?
oscarkey left a comment
lgtm!
Maybe a unit test for each? 😇
…nfig PriorLabs/TabPFN#890 (which adds get_inference_config) is on TabPFN main but not yet released — every released version (≤ v7.1.1) lacks the method. For those, fall back to the historical hardcoded MAX_NUMBER_OF_CLASSES=10 when the base estimator's class lives in a tabpfn-prefixed module. Other estimators (sklearn etc.) still get the explicit-alphabet ValueError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…282)

* Auto-infer alphabet_size in ManyClassClassifier from the base estimator's checkpoint

  The previous fallback (`estimator.max_num_classes_`) has been dead since the initial commit: TabPFN core has never set that attribute on any version (v2 / v2.5 / v2.6 / v3), so users always had to pass `alphabet_size` explicitly or hit `ValueError`.

  Now resolution cascades:
  1. Explicit `alphabet_size=...` (unchanged)
  2. `estimator.inference_config_.MAX_NUMBER_OF_CLASSES` (post-fit)
  3. Probe: fit a clone on 4×2 synthetic rows to populate `inference_config_`, then read `MAX_NUMBER_OF_CLASSES`

  The probe is cheap because TabPFN's `_load_checkpoint_cached` (`@lru_cache`) makes the subsequent codebook fits reuse the already-loaded checkpoint, so net I/O is a single ckpt read.

  Verified with the v2.5, v2.6, and v3 default checkpoints (alphabet auto-resolves to 10, 10, 160 respectively).

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Log resolved alphabet_size and base-fit count in ManyClassClassifier.fit

  Surfaces "Base estimator supports up to N classes; data has M — …" at verbose=1, in both the no-mapping and codebook branches, so users can see how the alphabet was resolved and how many base fits will follow.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Strip categorical_features_indices on the probe clone

  The probe fits on a synthetic 2-feature matrix; any user-provided categorical indices ≥ 2 (e.g. categorical_features_indices=[3]) on the base estimator would trip TabPFN's index-bounds validation before inference_config_ gets populated. Reset them on the clone.

  Surfaced by codex review on PR #282.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Restore clean ValueError when probe rejects the synthetic input

  Pre-PR, a non-TabPFN base estimator with no alphabet_size always raised the documented "alphabet_size must be specified ..." ValueError. After the probe was added, that path could surface an arbitrary error from inside the user's estimator instead. Catch ValueError/TypeError around probe.fit (the typical sklearn input-validation errors) and fall through to None so the documented ValueError still fires. Heavier exceptions (RuntimeError, OSError, etc.) still propagate so genuine bugs aren't masked.

  Surfaced by cursor-bot review on PR #282.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix ruff D209 on _probe_alphabet_size docstring

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Use TabPFN's get_inference_config() instead of a probe fit

  TabPFN exposes a public get_inference_config() method that loads the checkpoint without fit data and returns the active InferenceConfig (honoring any constructor override). Drop the probe fit, the synthetic X_probe / y_probe construction, the categorical_features_indices reset, and the (ValueError, TypeError) catch. All of it collapses into one method call.

  Suggested by adrian-prior on PR #282.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Default alphabet_size to 10 for older TabPFN without get_inference_config

  PriorLabs/TabPFN#890 (which adds get_inference_config) is on TabPFN main but not yet released; every released version (≤ v7.1.1) lacks the method. For those, fall back to the historical hardcoded MAX_NUMBER_OF_CLASSES=10 when the base estimator's class lives in a tabpfn-prefixed module. Other estimators (sklearn etc.) still get the explicit-alphabet ValueError.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
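The resolution order that the commits above converge on could look roughly like the sketch below. This is an illustration under assumptions, not the code merged in PR #282: the helper name `resolve_alphabet_size`, the constant name, the exact module-prefix check, and the three demo classes are all hypothetical, and whether the final code still reads `inference_config_` before calling `get_inference_config()` is assumed from the earlier cascade.

```python
from types import SimpleNamespace

# Historical hardcoded MAX_NUMBER_OF_CLASSES mentioned in the commit message.
LEGACY_TABPFN_ALPHABET_SIZE = 10


def resolve_alphabet_size(estimator, alphabet_size=None):
    """Hypothetical helper mirroring the described cascade."""
    # 1. An explicit value always wins.
    if alphabet_size is not None:
        return alphabet_size

    # 2. A fitted estimator already carries its config (assumed to still apply).
    fitted_config = getattr(estimator, "inference_config_", None)
    if fitted_config is not None:
        return fitted_config.MAX_NUMBER_OF_CLASSES

    # 3. Newer TabPFN: ask for the config without fitting.
    get_config = getattr(estimator, "get_inference_config", None)
    if callable(get_config):
        return get_config().MAX_NUMBER_OF_CLASSES

    # 4. Older TabPFN without the method: fall back to the legacy default.
    #    The prefix test here is an assumption about how "tabpfn-prefixed" is checked.
    if type(estimator).__module__.startswith("tabpfn"):
        return LEGACY_TABPFN_ALPHABET_SIZE

    # 5. Any other estimator must be configured explicitly.
    raise ValueError("alphabet_size must be specified ...")


# --- Hypothetical stand-ins used only to exercise the helper ---
class NewStyleEstimator:
    def get_inference_config(self):
        return SimpleNamespace(MAX_NUMBER_OF_CLASSES=160)


class OldStyleEstimator:
    pass


# Pretend this class was defined inside a tabpfn-prefixed module.
OldStyleEstimator.__module__ = "tabpfn.classifier"


class UnrelatedEstimator:
    pass


new_size = resolve_alphabet_size(NewStyleEstimator())
old_size = resolve_alphabet_size(OldStyleEstimator())
explicit_size = resolve_alphabet_size(UnrelatedEstimator(), alphabet_size=7)
```

Keeping the legacy default behind the module check means unrelated estimators still fail loudly instead of silently getting an alphabet of 10.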
No description provided.